TRANSACTIONS ON BIG DATA 1 A Distributed
Authors
Abstract
Java 8 introduced new capabilities, such as lambda expressions and streams, that simplify data-parallel computing. However, as a base language for Big Data systems, it still lacks a number of important capabilities, such as processing very large datasets and distributing computation over multiple machines. This paper gives an overview of the Java 8 Streams API and proposes extensions to allow its use in Big Data systems. It also shows how the API can be used to implement a range of standard Big Data paradigms. Finally, it compares performance with that of Hadoop and Spark. Although the implementation is a proof of concept, results indicate that it is a lightweight and efficient framework, comparable in performance to Hadoop and Spark, and up to 5 times faster for the largest input sizes tested.
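As an illustration of the lambda-and-streams style the abstract refers to, the following is a minimal word-count sketch that uses only the standard Java 8 Streams API on a single machine. It does not show the extensions proposed in the paper, and the class name `WordCountSketch` and method name `countWords` are hypothetical names chosen for this example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical example class: standard Java 8 streams only,
// not the distributed extensions described in the paper.
class WordCountSketch {

    // Counts word occurrences across the given lines using a parallel stream.
    static Map<String, Long> countWords(List<String> lines) {
        return lines.parallelStream()
                // "Map" phase: split each line into lower-cased words.
                .flatMap(line -> Arrays.stream(line.toLowerCase().split("\\s+")))
                // Drop empty tokens produced by blank lines or leading whitespace.
                .filter(word -> !word.isEmpty())
                // "Reduce" phase: group identical words and count them;
                // a TreeMap keeps the result sorted by word.
                .collect(Collectors.groupingBy(
                        Function.identity(), TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "big data on streams",
                "streams simplify big data");
        // Prints {big=2, data=2, on=1, simplify=1, streams=2}
        System.out.println(countWords(lines));
    }
}
```

The pipeline here runs on the cores of one JVM and assumes the input fits in memory, which are the two limitations the abstract names as motivation for extending the API.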
Similar resources
Optimization of majority protocol for controlling transactions concurrency in distributed databases by multi-agent systems
In this paper, we propose a new concurrency control algorithm based on multi-agent systems, which extends the majority protocol. We then suggest a clustering approach to improve reliability and reduce message passing and the algorithm's runtime. Here, we consider n different transactions working on non-conflicting data items. Considering execution efficiency of some different...
Opportunities in Big Data Management and Processing
Every day we witness new forms of data in various formats. Some examples include structured data from the transactions we make, unstructured data such as text communications of different kinds, and a variety of multimedia files and video streams. To ensure efficient processing of this data, often called ‘Big Data’, the use of highly distributed and scalable systems and new data management architectures, e.g....
Parallel Rule Mining with Dynamic Data Distribution under Heterogeneous Cluster Environment
Big data mining methods support knowledge discovery on highly scalable, high-volume, and high-velocity data elements. The cloud computing environment provides computational and storage resources for the big data mining process. Hadoop is a widely used parallel and distributed computing platform for big data analysis that manages homogeneous and heterogeneous computing models. The MapReduce fra...
Towards the End-to-End Design for Big Data Management in the Cloud: Why, How, and When?
With the wide-scale adoption of cloud computing and the explosion in the number of distributed applications and end-user devices, we are witnessing an insatiable desire to build bigger and bigger systems that can serve hundreds of millions of end users, are highly automated, and can collect enormous amounts of data in short periods of time. Often newer systems are implemented by integrating e...
Cross-cloud MapReduce for Big Data (IEEE Transactions on Cloud Computing)
MapReduce plays a critical role as a leading framework for big data analytics. In this paper, we consider a geo-distributed cloud architecture that provides MapReduce services based on the big data collected from end users all over the world. Existing work handles MapReduce jobs with a traditional computation-centric approach in which all input data distributed across multiple clouds are aggregated to a v...
Evolving Databases for New-Gen Big Data Applications
The rising popularity of large-scale real-time analytics applications (real-time inventory/pricing, mobile apps that give you suggestions, fraud detection, risk analysis, etc.) emphasizes the need for distributed data management systems that can handle fast transactions and analytics concurrently. Efficient processing of transactional and analytical requests, however, requires different optimizat...
Journal title:
Volume, issue
Pages -
Publication date 2017